KoKo: an L1 Learner Corpus for German
Abstract
We introduce the KoKo corpus, a collection of German L1 learner texts annotated with learner errors, along with the methods and tools used in its construction and evaluation. The corpus contains texts and corresponding survey information from 1,319 pupils and amounts to around 716,000 tokens. An evaluation of the transcriptions and annotations shows an accuracy of approximately 80% for orthographic error annotations, as well as high accuracies for transcription (> 99%), automatic tokenisation (> 99%), sentence splitting (> 96%) and POS tagging (> 94%). The KoKo corpus will be published at the end of 2014. It will be the first accessible linguistically annotated German L1 learner corpus and a valuable source for research on L1 learner language as well as for teachers of German as L1, in particular with regard to writing skills.
Similar resources
An Extended Version of the KoKo German L1 Learner Corpus
This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from three different German-speaking regions in three different countries. The KoKo corpus is richly annotated with learner language features on different linguistic levels such as errors or other linguistic characteristics that are not deficit-oriented, an...
Verb Second in Advanced L2 English: A Learner Corpus Study
The present study examines the interface between syntax and discourse-pragmatics in the production of verb second (V2) structures in a corpus of English texts by advanced L1 German and Dutch speakers. The evidence shows that the residual V2 produced by the learner groups studied is the result of a deficit at the interface rather than the transfer of narrow V2 syntax per se. The analysis offered...
Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and i...
What’s Hard? Quantitative Evidence for Difficult Constructions in German Learner Data
Our study is concerned with the identification of 'difficult' structures in the acquisition of a foreign language, which will shed light on theoretical considerations of L2 processing. We argue that, compared to simple vocabulary items or abstract syntactic patterns, structures that contain lexical material as well as categorial variables are especially difficult to acquire. T...
What motivates extra-rising patterns in L2 French: Acquisition factors or L1 transfer?
Learners of L2 French, be they German or Spanish, produce an extra-rising melodic movement (T*HH%) at the right edge of non-final IPs, whereas French native speakers do not produce such a form. From the analyses of a large data set extracted from a learner corpus, it appears that this non-native tonal pattern cannot be attributed to L1 transfer. Different factors are thus explored in order ...
Publication date: 2014